Non-Linear Compression: Gzip Me Not!
Authors
Abstract
Most compression algorithms used in storage systems today are based on an increasingly outmoded sequential processing model. Systems wishing to decompress blocks out-of-order or in parallel must reset the compressor’s state before each block, reducing adaptiveness and limiting compression ratios. To remedy this situation, we present Non-Linear Compression, a novel compression model enabling systems to impose an arbitrary partial order on inter-block dependencies. Mutually unordered blocks may be compressed and decompressed out-of-order or in parallel, and a compressor can adaptively compress each block based on all causally prior blocks. This graph structure captures the system’s data dependencies explicitly and completely, enabling the compressor to adapt using long-lived state without the constraint of sequential processing. Preliminary experiences with a simple Huffman compressor suggest that non-linear compression fits a diverse set of storage applications.
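The dependency model described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' Huffman compressor: it swaps in Deflate with a preset dictionary (the `zdict` parameter of Python's `zlib`) as a stand-in for adaptive state, and the names `parents`, `ancestors`, `context`, `compress_block`, and `decompress_block` are hypothetical. Each block is compressed against the concatenation of its causally prior blocks, so mutually unordered blocks can be processed in any order.

```python
import zlib

def ancestors(block, parents):
    """All causally prior blocks of `block`, in a deterministic topological order."""
    seen, order = set(), []
    def visit(b):
        for p in parents.get(b, ()):
            if p not in seen:
                seen.add(p)
                visit(p)          # emit a parent's own ancestors first
                order.append(p)
    visit(block)
    return order

def context(block, parents, plain):
    """Preset dictionary built from ancestor plaintext (assumption: simple concatenation)."""
    joined = b"".join(plain[a] for a in ancestors(block, parents))
    return joined[-32768:]        # Deflate can only reference the last 32 KiB

def compress_block(block, parents, plain):
    zdict = context(block, parents, plain)
    if not zdict:                 # root block: no prior state to adapt to
        return zlib.compress(plain[block])
    c = zlib.compressobj(zdict=zdict)
    return c.compress(plain[block]) + c.flush()

def decompress_block(block, parents, plain, packed):
    """Requires that all ancestors of `block` are already present in `plain`."""
    zdict = context(block, parents, plain)
    if not zdict:
        return zlib.decompress(packed[block])
    d = zlib.decompressobj(zdict=zdict)
    return d.decompress(packed[block]) + d.flush()

# Hypothetical dependency graph: b and c depend on a; d depends on b and c.
parents = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"]}
data = {
    "a": b"GET /index.html HTTP/1.1\r\nHost: example.org\r\nAccept: text/html\r\n\r\n",
    "b": b"GET /about.html HTTP/1.1\r\nHost: example.org\r\nAccept: text/html\r\n\r\n",
    "c": b"GET /news.html HTTP/1.1\r\nHost: example.org\r\nAccept: text/html\r\n\r\n",
    "d": b"GET /about/news.html HTTP/1.1\r\nHost: example.org\r\nAccept: text/html\r\n\r\n",
}
packed = {name: compress_block(name, parents, data) for name in data}

# Decompress in a different order that still respects the partial order.
restored = {}
for name in ["a", "c", "b", "d"]:
    restored[name] = decompress_block(name, parents, restored, packed)
```

Because `b` and `c` share no dependency on each other, the loop above could equally visit them as `b, c` or hand them to separate workers; only the ancestor relation constrains the order.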
Similar resources
Lossless and Near-lossless Compression of Ecg Signals
In this paper we present Linear Transformation Algorithm (LTA), which is based on a new transformation, Linear Block Transformation (LOT). Experimental results show that Linear Transformation Algorithm yields comparable results to Burrows-Wheeler Algorithm (BWA) [4] and outperforms Gzip and Shorten Waveform Coder for near-lossless ECG compression; for lossless ECG compression it yields better c...
The Effect of Non-Greedy Parsing in Ziv-Lempel Compression Methods
Most practical compression methods in the LZ77 and LZ78 families parse their input using a greedy heuristic. However, the popular gzip compression program demonstrates that modest but significant gains in compression performance are possible if non-greedy parsing is used. Practical implementations for using non-greedy parsing in LZ77 and LZ78 compression are explored and some experimental measur...
LFQC: a lossless compression algorithm for FASTQ files
MOTIVATION Next Generation Sequencing (NGS) technologies have revolutionized genomic research by reducing the cost of whole genome sequencing. One of the biggest challenges posed by modern sequencing technology is economic storage of NGS data. Storing raw data is infeasible because of its enormous size and high redundancy. In this article, we address the problem of storage and transmission of l...
MFCompress: a compression tool for FASTA and multi-FASTA data
MOTIVATION The data deluge phenomenon is becoming a serious problem in most genomic centers. To alleviate it, general purpose tools, such as gzip, are used to compress the data. However, although pervasive and easy to use, these tools fall short when the intention is to reduce as much as possible the data, for example, for medium- and long-term storage. A number of algorithms have been proposed...
A General Compression Scheme for Databases
Compression of databases not only achieves a reduction in storage space but can reduce overall retrieval times. Current schemes such as gzip and compress are impractical for the purposes of databases as they do not allow individual records to be retrieved. A recent compression scheme, sequitur, allows quick decompression of any individual section of the database; however, it uses extravagant amo...
Publication date: 2012